Continuous nonlinear dimensionality reduction by kernel Eigenmaps

Author

  • Matthew Brand
Abstract

We equate nonlinear dimensionality reduction (NLDR) to graph embedding with side information about the vertices, and derive a solution to either problem in the form of a kernel-based mixture of affine maps from the ambient space to the target space. Unlike most spectral NLDR methods, the central eigenproblem can be made relatively small, and the result is a continuous mapping defined over the entire space, not just the datapoints. A demonstration is made of visualizing the distribution of word usages (as a proxy for word meanings) in a sample of the machine learning literature.

Matthew Brand, Mitsubishi Electric Research Laboratories, Cambridge, MA, USA. First circulated fall 2002. Proceedings, International Joint Conference on Artificial Intelligence, IJCAI-2003.

1 Background: Graph embeddings

Consider a connected graph with weighted undirected edges specified by edge matrix $W$. Let $W_{ij} = W_{ji}$ be the positive edge weight between connected vertices $i$ and $j$, zero otherwise. Let $D := \mathrm{diag}(W\mathbf{1})$ be a diagonal matrix where $D_{ii} = \sum_j W_{ij}$, the cumulative edge weight into vertex $i$. The following points are well known or easily derived in spectral graph theory [Fiedler, 1975; Chung, 1997]:

1. The generalized eigenvalue decomposition (EVD)

$WV = DV\Lambda$   (1)

has real eigenvectors $V := [v_1, \cdots, v_N]$ and eigenvalues $\Lambda := \mathrm{diag}([\lambda_1 \ge \lambda_2 \ge \cdots \ge \lambda_N])$.

2. Premultiplying equation (1) by $D^{-1}$ turns the generalized eigenproblem into a stochastic eigenproblem

$(D^{-1}W)V = V\Lambda,$   (2)

where $D^{-1}W$ is a stochastic transition matrix having nonnegative rows that sum to one. The largest eigenvalue of equation (1) is therefore stochastic ($\lambda_1 = 1$) and its paired eigenvector is uniform ($v_1 = \mathbf{1}/\sqrt{N}$).

3. Expanding and collecting terms in $W_{ij}$ reveals the geometric meaning of the eigenvalues:

$\lambda_k = 1 - \sum_{ij} (v_{ik} - v_{jk})^2 W_{ij}/2.$   (3)

The $d$ eigenvectors paired to eigenvalues $\lambda_2$ through $\lambda_{d+1}$ therefore give an embedding of the vertices in $\mathbb{R}^d$ with minimal distortion vis-à-vis the weights, in the sense that a larger $W_{ij}$ stipulates a shorter embedding distance.
Formally, the embedding

$Y_{1:d} := [v_2, \cdots, v_{d+1}]^\top = \arg\max_{YDY^\top = I} \mathrm{trace}(YWY^\top)$   (4)

minimizes the distortion

$d - \mathrm{trace}(YWY^\top) = \sum_{k=2}^{d+1} (1 - \lambda_k) = \sum_{k=2}^{d+1} \sum_{ij} (v_{ik} - v_{jk})^2 W_{ij}/2$   (5)

for any integer $d \in [1, N]$. The norm constraint $YDY^\top = I$ sets the scale of the embedding and causes vertices of high cumulative weight to be embedded nearer to the origin.

4. $Y$ can be rigidly rotated in $\mathbb{R}^d$ without changing its distortion. The distortion measure is also invariant to rigid translations, but the eigenproblem is not, so there is an unwanted degree of freedom (DOF) in the solution. Due to stochasticity, this DOF is isolated in the uniform eigenvector $v_1$, which is suppressed from the embedding without error (because $1 - \lambda_1 = 0$). Adding $t v_1^\top$ to $Y$ rigidly translates the embedding by $t \in \mathbb{R}^d$.

5. Premultiplying equation (1) by $V^\top$ and rearranging equates it to the EVD of the graph Laplacian $D - W$:

$V^\top (D - W) V = I - \Lambda.$   (6)

6. Premultiplying by $D^{-1/2}$ connects equation (1) to the (symmetric) EVD of the normalized Laplacian:

$(D^{-1/2} W D^{-1/2}) V' = V' \Lambda$   (7)

with $V' := D^{1/2} V$.

In summary: Equation (1) gives an optimal embedding of a graph in $\mathbb{R}^d$ via eigenvectors $v_2, \cdots, v_{d+1}$; eigenvalue $\lambda_1$ is stochastic and the corresponding eigenvector $v_1$ is uniform. This is an important property of the EVD solution because it isolates the problem's unwanted translational degree of freedom in a single eigenvector, leaving the remaining eigenvectors unpolluted. Many embedding algorithms can be derived from this analysis, including the Fiedler vector [Fiedler, 1975], locally linear embeddings (LLE) [Roweis and Saul, 2000], and Laplacian eigenmaps [Belkin and Niyogi, 2002]. For example, direct solution of equation (1) gives the Laplacian eigenmap; as a historical note, the symmetrized formulation was proposed by Fiedler in the 1970s and has been used for heuristic graph layout since the 1980s [Mohar, 1991].

2 Transformational embeddings

Now consider a more general problem: We are given some information about the vertices in a matrix $Z := [z_1, \cdots, z_N] \in \mathbb{R}^{d \times N}$, whose columns are generated by applying a vector-valued function $z(\cdot) \to z \in \mathbb{R}^d$ to each vertex of the graph. We seek a linear operator which transforms $Z$ to the optimal graph embedding: $G(Z) \to Y$. We will call this the "transformational embedding," to distinguish it from the "direct embedding" discussed above. A natural candidate for the algebraic statement of the transformational embedding problem is the generalized EVD

$(ZWZ^\top)V = (ZDZ^\top)V\Lambda,$   (8)

because setting $Y = V^\top Z$ makes this equivalent to the original direct embedding problem. Again, there is an equivalent symmetric eigenproblem: Make the Cholesky decomposition $R^\top R \leftarrow ZDZ^\top$ into upper-triangular $R \in \mathbb{R}^{d \times d}$ (any Gram-like factorization will work; for example, given the EVD $A \Omega A^\top \leftarrow ZDZ^\top$, take $R = \Omega^{1/2} A^\top$; the Cholesky is especially attractive for its numerical stability, sparsity, and easy invertibility) and let

$B := (R^{-\top} Z W Z^\top R^{-1}) \in \mathbb{R}^{d \times d}.$   (9)

Then

$B V' = V' \Lambda$   (10)

with

$V' := RV, \quad V = R^{-1} V'.$   (11)

This gives an embedding $[v_2, v_3, \cdots]^\top Z$, and a computational advantage: If $Z \in \mathbb{R}^{d \times N}$ is a short matrix ($d \ll N$), the original $N \times N$ eigenproblem can be reduced to a very small $d \times d$ problem, and the matrix multiplications also scale as $O(d^2 N)$ rather than $O(N^3)$, due to the sparsity of $W$ and $D$.
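As a concrete illustration of this reduction, here is a minimal sketch in Python/NumPy of both the direct embedding of equation (1) and the reduced eigenproblem of equations (9)-(11). It runs on made-up inputs (a small random connected graph and random side information Z) and is not the author's code.

```python
# Illustrative sketch: the direct graph embedding of equation (1) and the
# reduced "transformational" eigenproblem of equations (8)-(11).
import numpy as np
from scipy.linalg import eigh, cholesky, solve_triangular

def direct_embedding(W, d):
    """Solve W v = lambda D v and keep the eigenvectors paired to
    lambda_2 .. lambda_{d+1}; the uniform eigenvector v_1 is discarded."""
    D = np.diag(W.sum(axis=1))
    lam, V = eigh(W, D)                     # generalized EVD, ascending eigenvalues
    order = np.argsort(lam)[::-1]           # descending: lambda_1 >= lambda_2 >= ...
    return V[:, order[1:d + 1]].T           # d x N embedding

def transformational_embedding(Z, W, d):
    """Reduced problem (Z W Z^T) V = (Z D Z^T) V Lambda via the Cholesky
    factor R^T R = Z D Z^T, equations (9)-(11); returns the raw embedding."""
    D = np.diag(W.sum(axis=1))
    A, ZDZ = Z @ W @ Z.T, Z @ D @ Z.T       # small dz x dz matrices
    R = cholesky(ZDZ)                       # upper triangular
    Rinv = solve_triangular(R, np.eye(R.shape[0]))
    lam, Vp = eigh(Rinv.T @ A @ Rinv)       # symmetric EVD of B, equation (10)
    order = np.argsort(lam)[::-1]
    V = Rinv @ Vp[:, order]                 # V = R^{-1} V', equation (11)
    return V[:, 1:d + 1].T @ Z, V

# Made-up demonstration inputs: a random connected graph and random vertex features.
rng = np.random.default_rng(0)
N, dz, d = 50, 4, 2
ring = np.eye(N, k=1); ring[-1, 0] = 1.0                 # a cycle guarantees connectivity
W = np.triu(rng.random((N, N)) * (rng.random((N, N)) < 0.2), 1) + ring
W = W + W.T                                              # symmetric, zero diagonal
Z = rng.standard_normal((dz, N))

Y_direct = direct_embedding(W, d)                        # optimal N-vertex embedding
Y_raw, V = transformational_embedding(Z, W, d)           # "raw" approximation
```

For $d \ll N$, the eigenproblem inside `transformational_embedding` is only $d \times d$ (here `dz` by `dz`), which is the computational saving described above.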
2.1 Correcting problematic eigenstructure

It is generally the case that $Y^\top \notin \mathrm{range}(Z^\top)$: there is no linear combination of the rows of $Z$ giving $Y$, so the desired linear mapping $G(Z) \to Y$ does not exist. Equations (8)-(11) give the optimal least-squares approximation $G(Z) = V^\top Z \approx Y$. This approximation can have a serious flaw: If $\mathbf{1} \notin \mathrm{range}(Z^\top)$ then the first eigenvector $v_1$ is not uniform, and it cannot be discarded as the unwanted translational DOF. Worse, all the other eigenvectors will be variously contaminated by the unwanted DOF, resulting in an embedding polluted with artifacts. For this reason, we call direct solution of equation (8) a raw approximation.

Our options for remedy are limited to those that modify the row space of $Z$ to reintroduce the uniform eigenvector. For reasons that will become obvious below, we restrict ourselves to operations that can be applied to any column of $Z$ without knowing any other column. The simplest such operation is to append a uniform row to $Z$, so that $z_i \to [z_i, 1]^\top$. This makes the relation between $Z$ and $Y$ affine and guarantees that $v_1^\top Z$ is uniform, but it can also force the eigenvectors to model additional variance that is not part of the problem.

Working backward from the desideratum that $v_1^\top Z$ should be uniform, let $K := \mathrm{diag}(v_1^\top Z)^{-1}$, so that $ZK$ is a modified representation of the vertices with the values of $z(\cdot)$ reweighted on a per-vertex basis: $z_i \to z_i / (v_1^\top z_i)$. Clearly $(V^\top ZK)^\top$ has a uniform first column, since each row is divided by its first element. It follows immediately that the related eigenproblem

$(ZKWKZ^\top)V'' = (ZKDKZ^\top)V''\Lambda''$   (12)

is stochastic, and $Y'' := [v''_2, v''_3, \cdots]^\top ZK$ is an embedding with the unwanted translational degree of freedom totally removed. Note that the raw and stochastic approximations are orthogonal (under metric $D$): $Y'' D Y''^\top$ is a diagonal matrix; the embeddings produced by the other methods are not. It should be noted that, when scaled to have equal norm $\mathrm{trace}(YDY^\top)$, none of these approximations has uniformly superior distortion scores; but in Monte Carlo trials with random graphs, we find a clear ordering from lowest to highest distortion: reweighted, affine, stochastic, raw (see figure 1).

[Figure 1: Comparison of methods for modifying the row space of Z. The graph shows distortion from the optimal embedding, averaged over $10^6$ trials with 50-node matrices having random edge weights and random $Z \in \mathbb{R}^{4 \times 50}$.]

The raw approximation is suboptimal because information about the $d$-dimensional embedding is spread over $d+1$ eigenvectors, no subset of which is optimal. The stochastic approximation is also suboptimal: it optimizes the different measure implied by equation (12). In practice, when computing embeddings of graphs whose embedding structure is known a priori, we find that the reweighted and stochastic approximations give results that are clearly very similar, and superior to the other approximations.
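The four approximations compared in figure 1 can be written schematically as follows. This is our reconstruction from the description above (continuing the earlier sketch and reusing its W, Z, and d), not the experimental code behind the figure.

```python
# Schematic reconstruction of the four approximations: raw, reweighted,
# affine, and stochastic.  Reuses W, Z, d from the earlier sketch.
import numpy as np
from scipy.linalg import eigh

D = np.diag(W.sum(axis=1))

def kernel_evd(Zm):
    """Generalized EVD (Zm W Zm^T) V = (Zm D Zm^T) V Lambda, descending order."""
    lam, V = eigh(Zm @ W @ Zm.T, Zm @ D @ Zm.T)
    return V[:, np.argsort(lam)[::-1]]

V = kernel_evd(Z)
Y_raw = V[:, 1:d + 1].T @ Z                 # raw: just discard the leading eigenvector

# Reweighted: keep the raw eigenvectors but rescale vertex i by 1/(v_1^T z_i).
# (With random Z the divisor can change sign; in the NLDR setting of section 3
# it stays positive by construction.)
K = np.diag(1.0 / (V[:, 0] @ Z))
Y_reweighted = V[:, 1:d + 1].T @ Z @ K

# Affine: append a uniform row so that 1 lies in range(Z^T).
Z_aff = np.vstack([Z, np.ones(Z.shape[1])])
V_aff = kernel_evd(Z_aff)
Y_affine = V_aff[:, 1:d + 1].T @ Z_aff

# Stochastic: re-solve the eigenproblem with the reweighted basis ZK (equation 12).
V_st = kernel_evd(Z @ K)
Y_stochastic = V_st[:, 1:d + 1].T @ Z @ K
```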
The need for any such correction stems from the fact that, the literatures of spectral graph theory and NLDR notwithstanding, equation (1) is not a completely correct statement of the embedding problem. We will show in a forthcoming paper that, as a statement of the embedding problem, equation (1) is both algebraically underconstrained and numerically ill-conditioned. In particular, point #2 is not strictly true: The stochastic eigenvalue is not always paired to a uniform eigenvector. This leads to pathologies that can ruin the embedding, whether it is obtained from the basic or the derived formulations. NLDR algorithms that can be derived from equation (1) (e.g., [Roweis and Saul, 2000; Belkin and Niyogi, 2002; Teh and Roweis, 2003]) do not remediate the problem. A forthcoming paper makes a full analysis of these issues, identifies the correct problem statements for both equations (1) and (8), and gives closed-form optimal solutions to both problems. The approximation methods discussed in this section are still useful in that they are faster and give reasonably high-quality embeddings. For the NLDR method and datasets considered below, the result of the reweighted approximation is almost numerically indistinguishable from the optimal embedding, and requires substantially less calculation. The reweighting method can also be justified as a Padé approximation of the optimal solution.

3 Nonlinear dimensionality reduction

Let $X := [x_1, \cdots, x_N] \in \mathbb{R}^{D \times N}$ be a set of points sampled from a low-dimensional manifold embedded in a high-dimensional ambient space. A reduced-dimension embedding $Y := [y_1, \cdots, y_N] \in \mathbb{R}^{d \times N}$ with $d < D \le N$ is a set of low-dimensional points with the same local neighborhood structure. We desire instead a mapping $G : \mathbb{R}^D \to \mathbb{R}^d$, which will generalize the correspondence to the whole continuum, with reasonable interpolation and extrapolation to be expected in the neighborhood of the data. Spectral methods for NLDR typically require the solution of many and/or very large eigenvalue or generalized eigenvalue problems [Kruskal and Wish, 1978; Kambhatla and Leen, 1997; Tenenbaum et al., 2000; Roweis and Saul, 2000; Belkin and Niyogi, 2002], and, with the exception of [Teh and Roweis, 2003; Brand, 2003], they offer embeddings of points rather than mappings between spaces. Here we show how to leverage the transformational embedding of section 2 into a continuous NLDR algorithm, specifically a kernel-based mixture of affine maps from the ambient space to the target space. To do so, we must show how the edge weight matrix $W$ and the vertex matrix $Z$ are specified.

Let $W_{ij} := f(x_i, x_j)$ iff $x_i$ and $x_j$ satisfy some locality criterion, e.g., $\|x_i - x_j\| < \epsilon$, otherwise $W_{ij} = 0$. As stated above, an embedding $Y$ of $X$ should satisfy

$Y = \arg\min_Y \sum_{i \ne j} \|y_i - y_j\|^2 W_{ij},$   (13)

where larger $W_{ij}$ penalize large distances between $y_i$ and $y_j$. How should $W_{ij}$ be computed? $f$ is a measure of similarity: The graph-theoretic literature usually takes $f(\cdot,\cdot) = 1$, while NLDR methods typically take $f(x_i, x_j) \propto \exp(-\|x_i - x_j\|^2/2\sigma)$, a Gaussian kernel, on analogy to heat-diffusion models [Belkin and Niyogi, 2002]. The uninformative setting $W_{ij} = 1$ is only usable when there is a very large number of points (and edges), so that connectivity information alone suffices to determine the metric properties of the embedding. The Gaussian setting has a complementary weakness: It can be very sensitive to small variations in distance to neighbors, which may be introduced by the curvature of the data manifold or by measurement noise in the ambient space. $f$ should be monotonically decreasing, relatively insensitive to noise ($df$ should be small), and it should lead to exact reconstructions of data sampled from manifolds that are already flat. Straightforward calculus shows that equation (13) has the desired minimum when $f(x_i, x_j) \propto \|x_i - x_j\|^{-1}$, or more generally, the multiplicative inverse of whatever distance measure is appropriate in the ambient space; a proof follows. (By contrast, the LLE weightings are not correlated with distances.)

Proof: Consider three points $\{x_1 = 0,\ x_2 = \lambda,\ x_3 = 1\}$ on a 1D manifold. What similarity measure $W_{ab} = f(\|x_a - x_b\|)$ causes the distortion $(y_2 - y_1)^2 W_{12} + (y_2 - y_3)^2 W_{23}$ to have a global minimum at $y_2 = \lambda$? Without loss of generality, we fix the global location and scale of the embedding by fixing the endpoints: $\{y_1 = 0,\ y_3 = 1\}$. Solving for the unique zero of the distortion's first derivative, we obtain the optimum at $y_2 = W_{23}/(W_{12} + W_{23})$. Since this is a harmonic relation, the unique continuous satisficing measure is $W_{ab} = 1/\|x_a - x_b\|$. This sets $W_{12} = 1/\lambda$ and $W_{23} = 1/(1 - \lambda)$; simple algebra confirms that indeed $y_2 = \lambda$ at the optimum. The induction to multiple points in multiple dimensions is direct.
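A minimal sketch of this weighting scheme, assuming a k-nearest-neighbour locality criterion and the inverse-distance $f$ argued for above. The noisy swiss-roll-style sampler is an illustrative stand-in, not the paper's exact data.

```python
# Build a symmetric edge matrix W with inverse-distance weights on a kNN graph.
import numpy as np

def knn_inverse_distance_graph(X, k):
    """X is D x N ambient data; returns symmetric W with W_ij = 1/||x_i - x_j||
    whenever j is among the k nearest neighbours of i (or vice versa)."""
    N = X.shape[1]
    diff = X[:, :, None] - X[:, None, :]
    dist = np.sqrt((diff ** 2).sum(axis=0))         # N x N pairwise distances
    W = np.zeros((N, N))
    for i in range(N):
        nbrs = np.argsort(dist[i])[1:k + 1]         # skip self (distance 0)
        W[i, nbrs] = 1.0 / dist[i, nbrs]
    return np.maximum(W, W.T)                       # symmetrize the kNN graph

rng = np.random.default_rng(1)
N = 900                                             # a 30 x 30 grid's worth of samples
t = 3 * np.pi / 2 * (1 + 2 * rng.random(N))         # swiss-roll-like parameter
h = 30 * rng.random(N)
X = np.vstack([t * np.cos(t), h, t * np.sin(t)]) + 0.1 * rng.standard_normal((3, N))
W = knn_inverse_distance_graph(X, k=12)             # 12 neighbours, as in section 4
```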
To make the problem scale invariant, we scale $W$ such that its largest nonzero off-diagonal value is 1 (consequently $df \le 1$ everywhere $f$ is computed). Let us now situate some Gaussian kernels $p_k(x) := \mathcal{N}(x \mid \mu_k, \Sigma_k)$ on the manifold. In this paper we take a random subset of the data points as kernel centers and set all $\Sigma_k = \sigma^2 I$; these kernels are radial basis functions. Let the vector

$z_{ik} := [K_i(x_i - \mu_k),\ 1]^\top \, p_k(x_i) / \textstyle\sum_k p_k(x_i)$   (14)

be the $k$th local homogeneous coordinate of $x_i$ scaled by the posterior of the $k$th kernel; $K_i$ is an optional local dimensionality-reducing linear projection. Let the representation vector

$z(x_i) := [\Downarrow_k\, z_{ik}] = [z_{i1}, \cdots, z_{iK}]^\top$   (15)

be the vertical concatenation of all such local coordinate vectors, and collect all such column vectors into a basis matrix $Z := [z(x_1), \cdots, z(x_N)]$.

To summarize thus far: our goal is to find a linear transform $y_i := G(z(x_i))$ of the basis (kernel-weighted coordinates) that is maximally consistent with our local distance constraints, specifically

$Y = [y_1, \cdots, y_N] = \arg\min_Y \sum_{ij} \|y_i - y_j\|^2 / \|x_i - x_j\|.$   (16)

This is isomorphic to the graph embeddings of section 2; the methods developed there apply directly to $W$ and $Z$. The continuous mapping from ambient to embedding space immediately follows from the continuity and smoothness of $z(\cdot)$: $G(x) = \mathbf{G} z(x)$, where the EVD determines the transformation $\mathbf{G} := [v_2, \cdots, v_{d+1}]^\top$ of the continuous kernel representation defined over the entire ambient space:

$z(x) := [\Downarrow_k\, [K_i(x - \mu_k),\ 1]^\top \, p_k(x) / \textstyle\sum_k p_k(x)].$   (17)

As a matter of numerical prudence, we recommend using the reweighted approximation:

$G(x) = \mathbf{G} z(x) / (v_1^\top z(x)).$   (18)

At first blush, it would seem that reweighting should not be necessary: By construction, $\mathbf{1} \in \mathrm{range}(Z^\top)$, so $v_1^\top z(x)$, and hence the denominator, should be uniform at the datapoints. However, as mentioned above, even when the algebra predicts this structure, numerical eigensolvers may not find it.

To obtain an approximate inverse mapping, we map the means and covariances of each kernel $p_k(\cdot)$ into the target space to obtain kernels $p_k(y) := \mathcal{N}(y \mid \mu'_k, \Sigma'_k)$ there. Then, breaking $\mathbf{G} = [\mathbf{G}_1, \cdots, \mathbf{G}_K]$ into blocks corresponding to each kernel, we take the Moore-Penrose pseudo-inverse of each block and set $\mathbf{G}^+ := [\mathbf{G}_1^+, \cdots, \mathbf{G}_K^+]$. If using the reweighted map, the approximate inverse map is

$G^+(y) \approx \mathbf{G}^+ z^+(y) \cdot (v_1^\top \mathbf{G}^+ z^+(y)),$   (19)

where $z^+(y) := [\Downarrow_k\, [y - \mu_k,\ 1]^\top \, p_k(y) / \textstyle\sum_k p_k(y)]$.
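Putting the pieces together, the following sketch builds the basis of equations (14)-(15), solves the reduced eigenproblem, and applies the reweighted map of equation (18). It continues from the X and W of the earlier sketch; the kernel count, bandwidth, the identity choice for $K_i$, and the small ridge added for numerical safety are our illustrative choices, not prescriptions from the text, and the inverse map of equation (19) is omitted.

```python
# Schematic kernel eigenmap: kernel basis, reduced EVD, reweighted continuous map.
import numpy as np
from scipy.linalg import eigh

def make_basis(X, centers, sigma):
    """z(x) for every column of X: vertical stack over kernels k of the local
    homogeneous coordinate [x - mu_k; 1] times the posterior p_k(x)/sum_j p_j(x)."""
    _, N = X.shape
    d2 = ((X[:, None, :] - centers[:, :, None]) ** 2).sum(axis=0)   # K x N
    p = np.exp(-d2 / (2 * sigma ** 2))
    p /= p.sum(axis=0, keepdims=True)                               # kernel posteriors
    blocks = [np.vstack([X - centers[:, [k]], np.ones((1, N))]) * p[k]
              for k in range(centers.shape[1])]
    return np.vstack(blocks)                                        # K(D+1) x N

rng = np.random.default_rng(2)
centers = X[:, rng.choice(X.shape[1], size=64, replace=False)]      # 64 random centres
sigma = 3.0                                                         # illustrative bandwidth
Z = make_basis(X, centers, sigma)

D = np.diag(W.sum(axis=1))
ZDZ = Z @ D @ Z.T
ZDZ += 1e-8 * np.trace(ZDZ) / ZDZ.shape[0] * np.eye(ZDZ.shape[0])   # ridge for numerical safety
lam, V = eigh(Z @ W @ Z.T, ZDZ)
V = V[:, np.argsort(lam)[::-1]]

d = 2
G = V[:, 1:d + 1].T                         # transform of the kernel basis
v1 = V[:, 0]

def kernel_eigenmap(x):
    """Continuous reweighted map G(x) = G z(x) / (v_1^T z(x)), equation (18)."""
    z = make_basis(x[:, None], centers, sigma)[:, 0]
    return (G @ z) / (v1 @ z)

Y = (G @ Z) / (v1 @ Z)                      # embedding of the training data
y_new = kernel_eigenmap(X[:, 0] + 0.05)     # the map is defined off the data too
```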
4 Illustrative example

We will use a variant of the "swiss roll," a standard test manifold in the NLDR community, to illustrate the arguments and methods developed in this paper. We sampled a twisted version of the manifold regularly on a 30×30 grid and added a small amount of Gaussian isotropic noise. Figure 2 shows the ideal $\mathbb{R}^2$ parameterization and two views of the ambient $\mathbb{R}^3$ embedding. Points are shown joined into a line to aid visual interpretation of the embeddings. All experiments in this section use a $W$ matrix that was generated using the 12 nearest neighbors of each point and the inverse distance function.

[Figure 2: The swiss roll.]

The Laplacian eigenmap embedding (figure 3) shows the embedding specified by the $W$ matrix. Note that it exhibits some folding at the corners and at the top and bottom edges, due partly to problems with the uniform eigenvector and exacerbated by the fact that spectral embeddings tend to compress near the boundaries. The Laplacian eigenmap requires solution of a large 900×900 eigenproblem, and offers no mapping off the points. Kernel eigenmaps will be approximations to this embedding.

[Figure 3: Laplacian eigenmap embedding.]

We now show some kernel eigenmaps computed using the transformational embedding of section 2. All embedding methods are given the same inputs. Figure 4 shows a raw kernel eigenmap embedding computed using a basis ($Z$ matrix) created from 64 Gaussian unit-$\sigma$ kernels placed on random points. This required solving a much more manageable 256×256 eigenproblem. 100 trials were performed with different sets of randomly placed kernels. In all trials, the reweighted and stochastic maps gave the best reconstructions, while the raw and affine maps exhibited substantial folding at the edges and corners of the embedding.

[Figure 4: Kernel eigenmap embedding, raw result.]

Figure 5 shows a reweighted kernel eigenmap computed from the same $W$ and $Z$ as figures 3 and 4. The result is smoother and actually exhibits less folding than the original Laplacian eigenmap. The problem can be regularized by putting positive mass on the diagonal of $W$ (e.g., $W \to W + I$), thereby making the recovered kernel eigenmap more isometric (bottom of figure 5). This regularization is appropriate when it is believed that all neighborhoods are roughly the same size.

[Figure 5: Kernel eigenmap embedding, reweighted and regularized results.]

The recently proposed Locality Preserving Projection (LPP) [He and Niyogi, 2002] is essentially the raw approximation (direct solution of equation 8) with $W_{ij} = e^{-\|x_i - x_j\|^2/t}$ and $Z = X$, thereby giving a linear projection from the ambient space to the target space that best preserves local relationships. LPP is admirably simple, but it can be shown that the affine approximation from section 2 will always have less distortion. LPP can also suffer from loss of the uniform eigenvector. Figure 6 shows embeddings of the swiss roll produced by LPP and by an affine modification of it that is equivalent to our method with a trivial single uniform-density kernel. Upgrading LPP to an affine projection captures more of the data's structure. Even so, there is no affine "view" of this manifold that avoids folding.

[Figure 6: LPP embedding and our affine upgrade.]
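The comparison just described can be sketched directly: with $Z = X$ the reduced eigenproblem gives a linear, LPP-like projection, and appending a uniform row gives the affine upgrade. This is our reconstruction under stated assumptions (it reuses X and the kNN graph W from the earlier sketches, and the heat-kernel width t is an illustrative value), not the implementation used for figure 6.

```python
# LPP-like linear projection versus the affine upgrade, both as "raw" solutions
# of equation (8) over a heat-kernel-weighted kNN graph.
import numpy as np
from scipy.linalg import eigh

t = 25.0
d2 = ((X[:, :, None] - X[:, None, :]) ** 2).sum(axis=0)   # squared pairwise distances
W_heat = np.where(W > 0, np.exp(-d2 / t), 0.0)            # heat-kernel weights on the kNN graph
D_heat = np.diag(W_heat.sum(axis=1))

def project(Zm, d=2):
    """Raw solution of equation (8) for the given basis Zm."""
    lam, V = eigh(Zm @ W_heat @ Zm.T, Zm @ D_heat @ Zm.T)
    V = V[:, np.argsort(lam)[::-1]]
    return V[:, 1:d + 1].T @ Zm

Y_lpp_like = project(X)                                    # linear projection of the ambient data
Y_affine = project(np.vstack([X, np.ones(X.shape[1])]))    # affine upgrade
```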
5 Visualizing word usages

In statistical analyses of natural language, similar usage patterns for two words are taken to indicate that the words have similar or strongly related meanings. Latent semantic analysis (LSA) is a linear dimensionality reduction of a term-document co-occurrence matrix. The principal components of this matrix give an embedding in which similarly used words are similarly located. Literally, co-location is a proxy for collocation (the propensity of words to be used together) and synonymy. We may expect that the kernel eigenmap offers a more powerful nonlinear analysis. The NIPS12 corpus (courtesy S. Roweis, available from the U. Toronto website) features a matrix counting occurrences of 13000+ words in 1700+ documents. We modeled the first 1000 words and the last 200 documents in the matrix. This roughly corresponds to one year's papers, a reasonable "snapshot" of the ever-changing terminology of the field. We stemmed the words and combined counts for the same roots, then took the distance between two word roots to be the cosine of the angle between their log-domain-transformed occurrence vectors ($x_{ij} \to \log_2(1 + x_{ij})$). The $W$ matrix was generated by adding an edge from each word to its 30 closest neighbors in cosine space. The representation $Z$ was made using 4 random words as kernel centers. Figure 7 discusses the resulting 2D embedding, in which technical terms are clearly grouped by field and many of the more common English words are tightly clustered by common semantics. The first two LSA dimensions of the same data (also shown in figure 7) reveal significantly less semantic structure.

6 Discussion

The kernel eigenmap generates continuous nonlinear mapping functions for dimensionality reduction and manifold reconstruction. Suitable choices of kernels can reproduce the behavior of several other NLDR methods. One could put a kernel at every local group of points, perform a local dimensionality reduction (e.g., a PCA) at each kernel, and thereby obtain from equations (8) and (17) an NLDR algorithm much like charting [Brand, 2003] or automatic alignment [Teh and Roweis, 2003]. Or, as in the demonstrations above, the kernel eigenmap can simultaneously determine the local dimensionality reductions and their global merger.

The kernel eigenmap typically substitutes a small dense EVD for the large sparse EVD of graph embedding problems. In the sparse case, a specialized power method can compute the desired eigenvectors in significantly less than the $O(N^3)$ time required for a full EVD. In the kernel setting, similar efficiencies apply because both $W$ and $Z$ are typically sparse, allowing fast construction of the reduced EVD problem $ZWZ^\top$; this too is amenable to fast power methods. Of course, the most important efficiency of the kernel method is its ability to embed new points quickly via the function $G(x)$: there is no need to compute a new global embedding or revise the EVD.

The reweighting scheme, although theoretically mooted by our subsequent discovery of a better problem formulation and closed-form solution, is still practically viable as a fast approximation for large problems, and as a post-conditioning step for the unavoidable numerical error of any NLDR algorithm based on eigenvalue decompositions. In this paper we have used random kernels. There are numerous avenues to discovering stronger methods by investigating placement and tuning of the kernels, stability of the embedding and its topological structure, and sample complexity. In short, all the issues that proved fertile ground for research in classification and regression can be studied anew in the context of estimating the geometry and topology of manifolds.

References

[Belkin and Niyogi, 2002] Mikhail Belkin and Partha Niyogi. Laplacian eigenmaps for dimensionality reduction and data representation. Technical Report TR-2002-01, University of Chicago Computer Science, 2002.
[Brand, 2003] Matthew Brand. Charting a manifold. In Proc. NIPS-15, 2003.
[Chung, 1997] Fan R. K. Chung. Spectral Graph Theory, volume 92 of CBMS Regional Conference Series in Mathematics. American Mathematical Society, 1997.
[Fiedler, 1975] Miroslav Fiedler. A property of eigenvectors of nonnegative symmetric matrices and its application to graph theory. Czech. Math. Journal, 25:619–633, 1975.
[He and Niyogi, 2002] Xiaofei He and Partha Niyogi. Locality preserving projections. Technical Report TR-2002-09, University of Chicago Computer Science, October 2002.
[Kambhatla and Leen, 1997] N. Kambhatla and Todd Leen. Dimensionality reduction by local principal component analysis. Neural Computation, 9, 1997.
[Kruskal and Wish, 1978] J. B. Kruskal and M. Wish. Multidimensional Scaling. Sage Publications, Beverly Hills, CA, 1978.
[Mohar, 1991] B. Mohar. The Laplacian spectrum of graphs. In Y. Alavi, editor, Graph Theory, Combinatorics and Applications, pages 871–898. J. Wiley, New York, 1991.
[Roweis and Saul, 2000] Sam T. Roweis and Lawrence K. Saul. Nonlinear dimensionality reduction by locally linear embedding. Science, 290:2323–2326, December 22, 2000.
[Teh and Roweis, 2003] Yee Whye Teh and Sam T. Roweis. Automatic alignment of hidden representations. In Proc. NIPS-15, 2003.
[Tenenbaum et al., 2000] Joshua B. Tenenbaum, Vin de Silva, and John C. Langford. A global geometric framework for nonlinear dimensionality reduction. Science, 290:2319–2323, December 22, 2000.
